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Abstract 

This paper is about the following question: How many riffle shuffles mix a deck of card for 
games such as blackjack and bridge? An object that comes up in answering this question is the 
descent polynomial associated with pairs of decks, where the decks are allowed to have repeated 
cards. We prove that the problem of computing the descent polynomial given a pair of decks 
is #P-complete. We also prove that the coefficients of these polynomials can be approximated 
using the bell curve. However, as must be expected in view of the #P-completeness result, 
approximations using the bell curve are not good enough to answer our question. Some of our 
answers to the main question are supported by theorems, and others are based on experiments 
supported by heuristic arguments. In the introduction, we carefully discuss the validity of our 
answers. 

1 Introduction 

1.1 Probability theory, card shuffling, and computational complexity theory 

The representation of a sequence of coin tosses as the binary digits of a real number appears in a 
1909 paper by Borel jSJ. This representation was a step towards the formalization of the notion of 
probability using measure theory. Card shuffling was another example that was discussed around 
that time by Borel, Poincare, and others. However, the analysis of card shuffling, unlike that of coin 
tosses, was not advanced very far at that time. Even as he noted that a large number of shuffles 
brings the distribution of a deck close to the uniform distribution, Renyi wrote in the 1960s that 
he would not deal with "what is meant by a large enough number of movements" |25| . Much is 
now known about the number of shuffles for mixing a deck of cards thanks to the development of 
the convergence theory of Markov chains by Aldous, Diaconis, and others |Hj. 

A notable result about riffle shuffles is due to Bayer and Diaconis The riffle shuffle, where 
a deck of cards is cut into two packets and cards are dropped from the two packets in some order, 
is the most common method of shuffling cards. For a certain model of the riffle shuffle, Bayer 
and Diaconis gave a complete analysis of riffle shuffles. This result assumes that all 52 cards are 
distinct and that each of the 52! possible permutations must be nearly equally likely for the deck 
to considered well mixed. 

However, in card games such as blackjack, the distinction between suits (a standard deck of 52 
cards has 4 suits with cards labeled A, 2, ... , 10, J, Q, K in each suit) is ignored. In other card games 
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such as bridge, all 52 cards are distinct, but there are only 4 players. In bridge, it is enough if 
each player receives a random set of 13 cards — the order in which those 13 cards are received by 
a player is inconsequential. Therefore, in some card games, not all cards in the standard deck are 
distinct, and in others, not all 52! permutations need to be nearly equally likely for the deck to be 
considered well mixed. The mixing times for such card games is the topic of this paper. 

The popularity of card games is a reason to study card shuffling, but it is not the only reason. 
Examples based on card shuffling are an inextricable part of the convergence theory of finite state 
Markov chains. One application of this theory is to theoretical computer science, and in particular, 
to the problem of computing the permanent of a matrix all of whose entries are or 1. 

Let A be an n X n matrix whose ijth. entry is Uij. Then its permanent is defined as 



where vr ranges over the permutations of {1, 2, . . . , n}. If the term under the summation in Hl.l|) were 
multiplied by ±1, with the sign being + for even it and — for odd vr, we would have a definition of the 
determinant of ^. In spite of this resemblance to the determinant, computing the permanent is much 
harder than computing the determinant. As every student of linear algebra knows, the determinant 
can be computed using 0{n^) arithmetic operations by carrying out Gaussian elimination or row 
reduction. But there is no known way of computing the permanent that is much better than using 
its definition (|l.lj) directly. The operation count for the direct method is more than n\. Even for 
n = 50, such an operation count is far out of the reach of today's computers. 

The permanent belongs to a class of counting problems known as It is also #P-complete 
which means that every problem in can be reduced to it in time that is polynomial in the 
size of the problem. There are many other ^^P-complete problems, some of them of considerable 
importance to trade and industry, but it is believed that there is no polynomial time algorithm for 
any of these problems. The P ^ NP conjecture is closely related to this belief. Proof or disproof 
of either conjecture would be a major advance in computational complexity theory. 

If it is only desired to approximate the permanent, rather than compute it exactly, Markov 
chains are of help. Jerrum, Sinclair, and Vigoda jl7j have devised a Markov chain, analyzed its 
convergence, and proved that the permanent can be approximated accurately with high probability 
in polynomial time. 

This relationship between Markov chains and computational complexity theory is inverted in 
our work. To find out the number of riffle shuffles that mix a deck of cards for blackjack, bridge and 
other card games, we need to be able to find the transition probability between two given decks, 
with some cards repeated, under a given number of riffle shuffles. There are simple formulas to 
pass back and forth between transition probabilities between two decks and a polynomial that will 
be called the descent polynomial for those two decks. We prove that given two decks, the problem 
of computing their descent polynomial is #P-complete. 

A graph of the coefficients of the descent polynomial looks strikingly like a bell curve for 
most pairs of decks. We prove a theorem showing that coefficients of the descent polynomial are 
approximated by the normal law in very general circumstances. The coefficients of the descent 
polynomial that we need to figure out the mixing times for games such as blackjack and bridge 
lie in the tail of the normal fit. Unfortunately, but not unexpectedly, the bounds on the normal 
approximation are not sharp enough in this region. The proof of T^P-completeness in Section 
4 suggests that these must be the coefficients that are hard to compute accurately, and indeed 
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Table 1: Table of total variation distances after 1 to 10 riffle shuffles for three scenarios. The 
meaning and validity of these numbers is discussed in the text. 

they are. It would be too simplistic to expect a probablistic technique useful for proving normal 
approximation to provide a way around a #P-complete problem. 

We find a way around using computations justified by heuristic arguments. The normal ap- 
proximation result helps in two ways — it suggests a probablistic approach to finding the descent 
polynomial given a pair of decks and our computations of the descent polynomial are guided by the 
form of the bell curve. We state some of our results and discuss the validity of our computations 
in the second part of this introduction. 

Apart from computational complexity theory and probability theory, the analysis of the mixing 
time for games such as blackjack and bridge is also connected to the theory of descents. The 
definition of descents of permutations is given in the next section and the central role of descents 
in the analysis of riffle shuffles will become clear. The systematic study of descents was begun by 
MacMahon |^. More recent references on this topic are Foata and Schiitzenberger [T^, Gessel 
and Reutenauer and Knuth |2nj . 

1.2 Theorems, experiments, and card games 

Each of the three lines of Tabled corresponds to a different scenario. In the first scenario, which was 
considered by Bayer and Diaconis there are 52 distinct cards and we want all 52! permutations 
to be nearly equally likely. In the second scenario (blackjack), the distinction between the suits is 
ignored and the source deck is assumed to be four aces on top of four 2s, on top of four 3s, and so 
on. We want all 521/4!^'^ possible permutations to be nearly equally likely. In the third scenario 
(bridge), all 52 cards are distinct but they are dealt to four players in cyclic order, and we only 
want the partition of the 52 cards to the four players to be random. 

The total variation distances in Table ^ ^-re a measure of how well mixed the deck is after 
a certain number of riffle shuffles. After 7 riffle shuffles the total variation distance is 0.334 for 
the Bayer-Diaconis scenario. The total variation distance is lesser after only 5 riffle shuffles for 
blackjack and after only 4 riffle shuffles for bridge. 

As already mentioned, we had to resort to careful computations and heuristic arguments to 
determine some of the numbers in Tabled How reliable are those numbers? How rigorous are the 
methods used to find them? We now answer these questions. 

Each number in the first line of Table ^ is given by a formula derived by Bayer and Diaconis 
[2]. The formula is quite simple to implement and the rounding errors in finding the numbers in 
the first line can be bounded easily. We are happy to accept the numbers in that line as theorems. 

The numbers in the second line of Tableware about blackjack. We prove that each blackjack 
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number has an error less than .001 with a probabiUty greater than 96%, or an error less than .01 
with a probability greater than 99.9996%. 

The proofs of those error estimates are valid only with an assumption, however. The numbers 
are generated after permuting the blackjack deck randomly 10 million times, and the proofs of the 
error estimates assume those permutations to be independent of each other. The issue is whether 
the computer generated pseudorandom numbers match the idealizations found in probability theory. 

Pseudorandom numbers have been studied and used extensively ^Hj. A specific example is 
given by the coupling from the past construction by Propp and Wilson [2l]. They prove that a 
sequence of random steps taken in a particular way is guaranteed to terminate at a state of the 
Ising model with a particular probability distribution. Implementations of that method, however, 
make do with pseudorandom numbers. The situation with our error estimates is similar. 

Finally, what of the bridge numbers found in the third line of Table ^ We prove no error 
estimates for these numbers. While the blackjack problem fits into a subclass of pairs of decks for 
which we can find the descent polynomial quickly [7j, the bridge problem does not. Considering 
that the problem of finding the descent polynomial given any pair of decks is ^T^P-complete and 
that "counting problems that can be solved exactly in polynomial time are few and far between" 
(see one begins to suspect that theorems that assert error bounds for the bridge numbers 

might be out of reach for a long time. 

The bridge numbers in Table Q were obtained using careful computations guided by heuristic 
arguments. These numbers have the same type of validity as the numbers produced in experimental 
physics and chemistry. The method for producing the bridge numbers has been checked for internal 
consistency in a variety of ways in Section 7, and the method is reported in enough detail to permit 
others to reproduce our results. The numbers are open to refutation, a quality of experimental 
results that has sometimes been emphasized 

2 A model of riffle shuffles 

The figure below shows a riffle shuffle of a deck of 5 cards numbered 1 through 5. 



1 4 4 (2) 

2 Cut I Riffle I (1) 

4 ~ 5 5 (2) 

5 3 3 (1) 



The first step in a riffle shuffle is the cut. In the picture above, the deck is cut below the third 
card to get two packets with 3 and 2 cards, respectively. The second step is the riffle. The packets 
are riffled by repeatedly dropping cards from the bottom of one of the two packets until a new 
shuffled deck is obtained. In the picture above, the second card to be dropped and the last card 
to be dropped are from the second packet and all others are from the first, as indicated by the 
parenthesized numbers on the right. 

In a riffle shuffle, the cut can be placed anywhere in the deck and the two packets can be riffled 
together in different ways. To model riffle shuffles, it is necessary to assign probabilities to the 
various ways of riffle shuffling a deck of n cards. 

In the model we use, a random riffle shuffle of a deck of n cards is obtained by first generating 
a sequence of n independent random numbers each of which is either 1 or 2 with probability 1/2. 
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1^= 1^= 1^=(1) 

2_===_ 6 6 (3) 

"3' - 3 ^= 3' '(2) 

4 — Cut 7 Riffle 7 (3) 

5 4 4 (2) 

6 8 8 (3) 

7 5— 5 (2) 

8 2' 2' '(1) 



Figure 1: A 3-shuffle of 8 cards. The numbers in parentheses near the right edge of the figure give 
the packet the card comes from. 

For the riffle shuffle depicted above, the corresponding sequence is given in parenthesis on the right. 
The cut must be placed such that the number of cards in the first packet equals the number of Is 
in the sequence, and the number in the second packet equals the number of 2s. During the riffle, 
a card must be dropped from the first or second packet to get the ith card in the shuffled deck 
according as the ith number in the sequence is 1 or 2. The riffle shuffle depicted above results from 
the sequence 2, 1, 1, 2, 1. 

This model can be extended to assign probabilities to a-shufftes, in which the deck is cut into 
a packets, for any positive integer a. Figure ^ depicts a 3-shuffle of 8 cards. The extension is as 
follows. First, consider a sequence of n random numbers each of which is uniformly distributed 
over the set {1, 2, . . . , 0} and independent of the n — 1 other random numbers. When the deck of n 
cards is cut into a packets, the number of cards in the pth packet must be equal to the number of 
ps in the random sequence. If the number in the ith position of the random sequence is p, then the 
card that ends up in that position in the shuffled deck must be dropped from the pth packet. The 
random sequence that corresponds to the 3-shuffle depicted in Figure Q is 1,3,2,3,2,3,2,1. From 
here onwards, the phrases riffte shuffle and 2-shuffle are used interchangeably. 

In this model of the a-shuffle, the pth packet will be empty if p does not appear at all in the 
random sequence. If every instance of p precedes every instance of p + 1 in the random sequence, 
then all cards from the p + 1st packet must be dropped before any card is dropped from the pth 
packet. An a-shuffle has a — 1 cuts, and when some packets are empty, some of these cuts fall 
between the same positions. 

According to Bayer and Diaconis 3!, this model was described by Gilbert and Shannon in 1955 
and independently by Reeds in 1971. It is also described by Epstein who assumes a = 2 and 
calls it the amateur shuffle. It is sometimes called the GSR-model, after three of the people who 
proposed it. Many aspects of this model became clear only with the work of Bayer and Diaconis. 

One of the descriptions of this model of the 2-shuffle given by Bayer and Diaconis 3^ is as 
follows. A deck of n cards is cut after the A;th card with probability ^{j^] in other words, the cut 
is binomial. During the riffle, if the first packet has a cards and the second packet has b cards, the 
next card to be dropped is from the bottom of the first packet with probability a/(a + 6), and from 
the bottom of the second packet with probability b/{a + h). 

Every a-shuffle is a rearrangement of n cards and can therefore be thought of as a permutation 
vr of {1, 2, . . . , n}. If the ith card ends up in the jth position, then 7r(i) = j. The number of descents 
of a permutation turns out to be an essential concept and we will explain it in three different ways. 
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Figure 2: Depiction of the permutation 7r(l), 7r(2), . . . , 7r(8) = 1,8,3,5,7,2,4,6, which has 2 de- 
scents. Arrows that originate at i and i + l on the left intersect if i = 2 or i = 5. If this permutation 
is to be reahzed as an a-shuffie, cuts must be placed in the two positions shown in Figure 1. 

• Firstly, the number of descents of vr is equal to the number of positions i, 1 < i < n — 1, with 
7r(i) > Tr{i + 1). If we walk along the sequence 7r(l),7r(2), . . . ,7r(n) from beginning to end, a 
descent must be recorded every time there is a decrease in value. 

• For the second explanation, the permutation vr must be represented as shown in Figure l2j 
The numbers 1 to n are listed twice with the listing on the left corresponding to positions 
of the unshuffled deck, or the source deck, and that on the right to positions of the shuffled 
deck, or the target deck. If 7r(i) = j, an arrow is drawn that originates at the i on the left 
and terminates at the j on the right. The number of descents is equal to the number of pairs 
of arrows that originate at consecutive positions i and i + 1 and cross each other. 

• For the third explanation, we realize the permutation vr as an a-shuffle for some a. If vr is 
depicted as shown in Figure HI each packet is a block of contiguous cards on the left, and 
the arrows coming out of the same packet or block may not cross each other. A cut must be 
placed wherever two arrows that originate at consecutive positions cross each other. Therefore 
the minimum number of cuts necessary to realize vr as an a-shuffle is equal to the number of 
descents in vr, and a equals the number of cuts plus 1. 

Bayer and Diaconis [H] proved that the probability that an a-shuffle results in a permutation tt, 
with des(7r) = d, is given by 



What we need to understand is the effect of repeated 2-shuffles of a deck. The above formula alone 
is not enough. The result stating that an a-shuffle followed by a 6-shuffle is equivalent to an ab- 
shuffle, proved by Aldous P, Bayer and Diaconis [3], and Reeds, is also needed. With that result, 
it follows immediately that k 2-shuffles are equivalent to a single 2'^-shuffle. The probability that k 
2-shuffles result in a permutation vr can then be found using (|2.1j) . In addition, l|2.1j) tells us that 
the transition probability can take on only n different values corresponding to d = 0, 1, . . . , n — 1 
for a fixed. 




(2.1) 
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3 Decks with repeated cards and the descent polynomial 



In this paper, the term deck refers to an ordered sequence of cards. Let -D be a deck with cards 
labeled 1,2, ... ,h. If the number of cards labeled 1 is ni, the number labeled 2 is n2, and so on, 
the total number of cards in L) is n = ni + 712 + • • • + Uh- For example, the deck D = 1,2, 1, 2, 1, 1 
has ni = 4, ?i2 = 2, and n = 6. The deck 

D = 



which we will abbreviate as 1"^ , 2"^ , • • • , /i"* , has rii cards labeled 1 above n2 cards labeled 2 and 
so on. The deck D = (1, 2)"" has n = 2nQ cards with cards labeled 1 and 2 alternating. Given D, 
D{i) denotes the label of the card in the ith position in D. For example, \i D = 1, 2, 1, 2, 1, 1 then 
L>(1) = 1 and L>(4) = 2. 

Normally, the cards of a deck will be listed from left to right, with the label of the topmost card 
appearing first in the list, and with the labels separated by commas, as in the previous paragraph. 
However, sometimes the commas will be omitted. 

If a permutation vr is applied to a deck D = ei, . . . , e„, it sends the card Cj in position i to 
position vr(i). Therefore the resulting deck is e^-i(i), e^-i(2)i • • • where vr"-"^ is the inverse 

of the permutation vr. 

Let Di and D2 be decks of n cards, ric of which are labeled c for 1 < c < h. We say that a 
permutation tt of {1, 2, . . . , n} belongs to the set of permutations from Di to D2 if Di{i) = D2{'k{i)) 
for 1 < i < n. That set will be denoted by Tl{Di; D2). It includes all the permutations which when 
applied to Di result in D2 and only those. For example, if Di = 1,1,2,2 to D2 = 1,2,2,1, 
n(_Di;L>2) has 4 members, given by 7r(l), 7r(2), 7r(3), 7r(4) equal to 1,4,2,3 or 1,4,3,2 or 4, 1,2,3 or 
4, 1, 3, 2. In general the cardinality of Tl{Di; D2) is ni\n2\ . . . rihl. 

The descent polynomial of Il(Di; D2), the set of permutations from Di to D2, is defined as 
^^^des(7r)^ where vr ranges over the set Il{Di; D2) and des(7r) is the number of descents of vr. Let 
the descent polynomial be 

Co + CiX + • • • + Cn-ix"-^ (3.1) 

The coefficient Cd equals the number of permutations in Il{Di; D2) with d descents. The probability 
that an a-shuffle of D\ results in the deck D2 is therefore given by 

a formula obtained by using (|2.H) and summing over all the permutations in n(Z)i;Z)2). Setting 
a = 1,2, ... ,n gives a triangular system of equations for pi,p2, ■ ■ ■ ,Pn in terms of the coefficients 
Co, ci, . . . , Cn-i. This triangular system can be inverted using a binomial identity (see |JJ:, p. 269]) 
to get 

+ . /, .■.^/'n + l\ , , _„fn + l 



cd = Pci+i{d + iT-Vdd^[^ ^ j+Pd-i{d-iT[^ 2 J -••• + (-i)>ii"| ^ J, (3.3) 

for 1 < d < n. Using (|3.2j) and (|3.3j) . it is easy to pass back and forth between the transition 
probabilities pa and the descent polynomial (|3.1|1 . 
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Suppose we are given a source deck Di and asked to determine how many riffle shuffles mix that 
deck. We will examine a definition of mixing later on, but surely we need to be able to determine the 
transition probability from Di to any rearrangement of its cards under an a-shuffle. Since it is easy 
to pass back and forth between the transition probabilities and the descent polynomial, we need to 
be able to determine the descent polynomial of the permutations from Di to any rearrangement of 
it. This begs the question, given decks Di and D2 is it possible to compute the descent polynomial 
of permutations from Di to D2 efflciently? To answer this question, we take a trip through 
computational complexity theory. 

4 An excursion to computational complexity theory 

We begin with a decision problem seemingly unrelated to our concerns: 

THREE DIMENSIONAL MATCHING (3DM): Finite sets X = {xi, . . .,x^}, Y = {yi, . . . 
and Z = {zi, . . . , Zm} of equal cardinality are given. A subset T of X xY x Z is also given. Decide 
if there exists a subset M of T of cardinality m such that every element of X occurs exactly once 
as the first element of a triple in M, every element of Y occurs exactly once as the second element 
of a triple in M, and every element of Z occurs exactly once as the third element of a triple in M. 

3DM belongs to the class of decision problems known as NP. Karp showed a way to reduce 
every problem in NP to 3DM in time polynomial in problem size and thus proved 3DM to be NP- 
complete. Karp proved the NP-completeness of a number of other decision problems as well; his 
paper is a cornerstone of computational complexity theory. The P 7^ NP conjecture implies that 
there is no algorithm for 3DM whose running time is polynomial in m. Finding a polynomial time 
algorithm or a super-polynomial lower bound for a single NP-complete problem suffices to resolve 
this conjecture since all NP-complete problems can be reduced to each other in polynomial time. 
Kozen's book has a good introduction to some of the basic topics of computational complexity 
such as NP-completeness and #P-completeness. 

The following decision problem is related to card shuffling: 

MIN CUTS: Given two decks, Di and D2, and a positive integer d, decide if there exists a permu- 
tation vr € Il{Di;D2) with d or fewer descents. 

To determine the transition probability from Di to D2 under an a-shuffle using (|3.2|) . we need 
to know the coefficients Cj of the descent polynomial (|3.H) . MIN CUTS asks if one of the coefficients 
Ci, < i < d, is nonzero for a given d. 

Another decision problem related to card shuffling is the following: 

RIFFLE: Given p nonempty packets of cards Pi, P2, ■ ■ ■ , Pp and a deck D, decide if it is possible to 
riffle the packets and get D. 

Each packet of cards Pi is a set of cards with labels placed one above the other. We may also 
call it a deck, although we do not. All the cards in all the packets will appear somewhere in the 
new deck. RIFFLE can also be worded in terms of sequences and subsequences. RIFFLE has a 
solution if and only if each packet Pi corresponds to a subsequence of D, such that each packet 
equals the corresponding subsequence and every card of D appears in exactly one subsequence. 

Both RIFFLE and MIN CUTS are NP-complete. To prove that, we give polynomial time 
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many-one reductions from 3DM to RIFFLE and from RIFFLE to MIN CUTS. 

3DM reduces to RIFFLE. Suppose we are given an instance of 3DM. It can be reduced to 

RIFFLE as follows. The elements Xi,yi,Zi of X,Y,Z become card labels and we assume X,Y,Z 
to be pairwise disjoint with no loss of generality. For every triple (xj, yj, Zk) G T create the packet 
Xi,yj, Zk, L, where L is a new card label. For example, if {x3,yl,z2) G T, we create the packet 
x3, 2/1, z2, L. If there are t triples in T, we create t packets totally in the instance of RIFFLE. The 
deck D for this instance of RIFFLE is given by 

n — r „ ^, ^ jm ai am /3m ~7i ^7m T/n* 

u — ■'^X-, ■ ■ ■ ■, -t^mi y\i ■ ■ ■ 1 ym-i '^Xi ■ ■ ■ 1 '^mi ^ ) ■^'l ^ • • • ■> -^m i ill t • • • i ilm ! ^1 j • • • > i j 

where a^, (3i, and 7^ are one less than the number of occurrences of Xj, and Zi as elements of 
triples in T , respectively, but if the number of occurrences is zero those numbers must also be zero. 
Additionally, m* = max(0, i — m), where t is the number of triples in T. Wc claim that the given 
instance of 3DM has a matching M if and only if the t packets in the created instance of RIFFLE 
can be riffled to get D. 

Suppose the matching M is a solution of 3DM and (xj, yj, z^) G M. Then the card labeled L in 
the packet Xi, yj, Zk, L must be dropped to get one of the m Ls that occur as the first block of Ls 
in D, and the cards Xi, yj, Z}^ must be dropped from this packet to get the cards with those labels 
that precede the first block of Ls in D. If the triple {xi,yj,Zk) belongs to T but not to M, the 
card labeled L in the packet Xi, yj,Zk,L must be dropped to get one of the m* Ls that occurs at 
the very bottom of D and the cards Xi, yj, z^ must be dropped from this packet to get cards with 
those labels that follow the first block of Ls in D. We can riffle the packets in this way to get D. 
Suppose the instance of RIFFLE that was created from the given instance of 3DM has a solution. 
Consider the m packets whose Ls are dropped to get the m Ls that occur as the first block of Ls 
in D. The triples that correspond to these packets must form a matching M of 3DM. The proof of 
validity of the reduction from 3DM to RIFFLE is now complete. 

The reduction from 3DM to RIFFLE uses cards with 3m + 1 different labels. What if cards are 
not allowed to have arbitrarily many labels? We modify the reduction to use only cards with the 
four labels [, ], c, and L. The modification is to replace each Xi that occurs in a packet or in the 
deck D by the list of cards [c*]; each yi by [c™"'"*]; and each Zi by [c^"^"*"*]. The opening and closing 
brackets [ and ] that occur in D and in each of the packets are properly matched with no nesting. 
Therefore, in any solution of RIFFLE, if a packet drops a [ to get the very last or bottommost [ 
that occurs in D, that dropped [ must be the bottommost [ in that packet and the same packet 
must have dropped the matching ] to get the bottommost ] in D. By induction, we conclude 
that matching parentheses [ and ] in a packet must be dropped to get matching parentheses in D. 
Therefore the list of cards that code for Xi or y^ or Zi in any packet must be dropped all at once to 
get a contiguous sequence of cards in D. The argument for the validity of the earlier reduction can 
now be applied to its modification. A little thought will convince the reader that cards labeled L 
in this reduction can be replaced by cards labeled c. We have proved that RIFFLE is NP-complete 
even if the cards are allowed to have only three different labels. 

RIFFLE reduces to MIN CUTS. RIFFLE can be reduced to MIN CUTS as follows. Given an 
instance (Pi, . . . , P^; D) of RIFFLE, consider the decks 

L*! = PiLP2L...Pp 
D2 = DLP-\ 
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Figure 3: A graph of the coefficients of a descent polynomial looks like the bell curve. 



where L is a new label that does not occur in the given instance of RIFFLE. The instance of MIN 
CUTS uses these decks Di and D2 and chooses d = p — 1. We claim that RIFFLE has a solution 
if and only if there exists a permutation in H{Di] D2) with p — 1 descents. For a proof, it suffices 
to note that any permutation in Il{Di; D2) must cut Di after each of the p — 1 Ls in Di as all the 
Ls in D2 occur at the bottom. This reduction uses only a single new label. We have now proved 
the following theorem. 

Theorem 4.1. MIN CUTS is NP-complete. MIN CUTS remains NP-complete even if cards are 
allowed only four different labels. 

The problem of finding the descent polynomial p. If) of Ii{Di; D2) is the counting version of MIN 
CUTS since the coefficient Cd equals the number of permutations in Tl{Di] D2) with d descents. 
Once the definition of #P-completeness in terms of Turing machines is understood, it will be clear 
that the proof of Theorem l4.H with slight modifications, implies that finding the descent polynomial 
is #P-complete even if only four different labels are allowed. We conjecture that finding the descent 
polynomial is #P-complete even if Di and D2 have cards with only two different labels. 

If all cards in Di are distinct, there can be only one permutation vr in Tl{Di] D2) and the descent 
polynomial can be easily found. There are other interesting cases where the descent polynomial 
can be found efficiently. If either Di or D2 is a deck where all cards with the same label occur in a 
sequence of contiguous positions, there is an efficient algorithm for finding the descent polynomial. 
See [Zj. There may be yet more cases. 

There is an algorithm of time complexity 0(n^^) for determining the coefficient of x'^ in the 
descent polynomial of permutations from a given deck Di to another given deck D2. A detailed 
description of this algorithm will appear in the first author's Ph.D. thesis. As the running time of 
this algorithm is exponential in d, its use to determine the mixing times for games like blackjack 
and bridge is impractical on today's computers. 
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5 A brief introduction to Stein's method of auxiliary randomiza- 
tion 



As the task of finding the descent polynomial is T^P-complete and therefore unlikely to have an 
efficient algorithm, it is necessary to find a way to approximate the descent polynomial. 

To approximate the descent polynomial, we turn to Stein's method of auxiliary randomiza- 
tion. Stein's method can be used to approximate some probability distributions by the normal 
distribution or the Poisson distribution or the binomial distribution. 

Graphs of coefficients of most descent polynomials look like bell curves, as shown in FigureOland 
as proved in the next section. Although the normal approximations give us a good sense for what 
these polynomials look like, the approximations are not accurate for the first few coefficients of the 
descent polynomial. We will need those coefficients with good relative accuracy, and in Section 7, 
we obtain usable approximations for those coefficients based on heuristic arguments. The methods 
of Section 7 are guided by normal approximation results. 

Let Xi,X2, ■ ■ ■ ,Xn be a sequence of independent and identically distributed random variables 
with P(A'j = 0) = P(A'j = 1) = 1/2. By elementary probability theory, the distribution of the sum 
S = Xi + X2 + • • • + Xn is binomial and can be approximated by the standard normal distribution 
after suitable normalization. We use this simple example to give a brief introduction to Stein's 
method j^H] |2Z1- The introduction is broken down into three steps. 

Characterization of the standard normal distribution. The standard normal distribution 
has the probability density function (l/\/27r) exp(— Another characterization would be in 
terms of its moments. There are ways to prove central limit theorems using these characterizations 
of the normal distribution, but neither is used by Stein's method. The characterization of the 
normal distribution used by Stein's method is given in the lemma below. 

Lemma 5.1. A random variable W has the standard normal distribution if and only if the expec- 
tations 'E(W f{W)) and E(/'(M/^)) are equal for all bounded continuous functions f whose first and 
second derivatives are bounded and piecewise continuous. 

The proof in one direction is a simple use of integration by parts. The proof in the other 
direction is also elementary, but more involved. Stein ,27. chapter 2] has given a proof of a much 
refined version of this lemma. 

For the sum S, fi = F,S = n/2 and cr^ = Var(S') = n/4. We want to show that the distribution 
of T = (S* — [i)l\fo is close to normal. The expectations E(r/(T)) and E(/'(r)), for functions 
/ as in Lemma l5.ll will not be exactly equal, since the distribution of T is not exactly normal, 
but if it is possible to show that the expectations are close to one another then there might be a 
way to show that the distribution of T is close to normal. But how to show that the expectations 
E(r/(r)) and E(/'(T)) are close? The answer is to proceed by throwing some extra randomness 
into the problem and then use the familiar Taylor series expansion. 

Auxiliary randomization. There are many ways to add extra randomness with a view to showing 
the expectations E(T/(T)) and E(/'(r)) to be close to each other. The way described here is related 
to size biasing. 

To explain size biasing, we consider a random variable W with a continuous density function 
p{x) and finite expectation. Assume that > or p{x) = for x < 0. Then EW = xp{x)dx. 
Therefore xp{x) /EW is also a probability density function. A variable with that density is said to 
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be VF-size biased. More generally, W* is said to have the W-size biased distribution ifE{Wf{W)) = 
FiWFj{f{W*)) for all continuous / for which the expectation on the left hand side exists. The size 
biased distribution is defined only for non- negative random variables. If is a 0-1 valued random 
variable which is 1 with a positive probability, its size biased distribution assigns probability 1 to 
the value 1. 

The size biasing of sums such as S above is facilitated by the following lemma [2]. 

Lemma 5.2. Let W = Xi + X2 + - ■ ■ + Xn where the Xi are all 0-1 valued random variables, but not 
necessarily independent or identically distributed. Let L be a random variable which is independent 
of the Xi and which satisfies P(/ = i) = KXi/FiW. Let X*,X2, ■ ■ ■ ,X* be a sequence of random 
variables such that Xj = 1 and 

P{{XlX;,...,X:,)GA\l = i)=F{{Xi,X2,...,Xr,)GA\Xi = l), 

for all possible sets A. Then W* = X^ + X| -|- • • • -1- X* has the W-size biased distribution 

Proof. The sequence of equalities below proves the lemma. To justify the second equality below, 
note the assumption about the conditional distributions of the X* and the Xi in the lemma. For 
the fourth equality below, note that EXj = P(Xj = 1) since Xi is 0-1 valued. 

n 

^f{W*) = J2HfiW*)\l = i)F{L = i) 

1=1 

n 

= ^E{f{W)\X, = l)P{l = i) 
1=1 

n 

= Y,miW)\X, = 1)EX,/EW 

i=l 

1=1 

EW ^ V *M 

i=l 



□ 



Recall that the random variable S was defined to be a sum of independent and identically 
distributed random variables Xi, . . . , X„. Let / be independent of the Xi and uniformly distributed 
over the set {1, 2, . . . , n}. Define X* = Xi if i ^ I and Xj = 1. Now the hypotheses of Lemma 15.21 
are easily verified and we may assert that the random variable S* = X^ -|- • • • -|- X* has the S'-size 
biased distribution. 

The random variable S* was built up using the Xi and some extra randomness, namely the 
random variable /. Taylor series expansion must be used to see how having S* around helps in 
proving the distribution of S to be normal. 

Taylor series expansion. Let PV^ be a non-negative random variable. Let W* have the T^-size 
biased distribution. Assume fi = EW and = Var(H^) = E{W — /i)^. We want to compare the 
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distribution of (W—^i)/a with the standard normal distribution. Consider the following calculation. 



E 



a 



a 



a \ a 



E/ 



a 



_ f^J {W*-W) ^J W-f, ' 
ay a V ^ . 



a 



a 



where U is uniformly distributed over [0, 1] and independent of all other random variables. In 
the last line above, the remainder term from Taylor's theorem, which is usually written as an 
integral, is written as an expectation by using U . A simple calculation shows that E(PF* — W) = 
(EVF^/ETy) — EW = o"^/^. This calculation makes it plausible that the first term on the right 
hand side of the last equality above could be close to E/'((T^ — n)/a). The second term will be 
small only if W* — W is small. This is an important requirement. Not only must W* have the 
VF-size biased distribution, the joint distribution of W and W* must be such that W* is close to 
W. 

Estimates carried out after Taylor series expansion imply the following theorem. It is due to 
Goldstein ^1]. The size biased version of Stein's method was derived by Baldi, Rinott and Stein 

m 

Theorem 5.3. Let W be a non-negative random variable with FjW = fi and Var(H^) = o"^. Let 
W* be jointly defined with W such that its distribution is W-size biased. Let \ W* — W\ < B and 
let A = B/a. Let B < cr^/V^/S/I. Then 



W 



l^ 



< X 



$(x) 



< 0AA + -{64A^ 
a 



+ + 



23// 



Yai {E{W* -W\W)), 



where ^ is the standard normal distribution. 



Using Theorem 15.31 we can prove that the distribution of {S — fi) ja is close to the normal 
distribution. We state again that \x = ES = n/2 and <t^ = Var(S') = n/4. From the construction of 
S*,\S* -S\<BiovB = l. Given 5, S^-S" = 1 if I = i and X, = 0. This happens with probability 
{n-S)/n. Otherwise, S*-S = 0. Therefore, E(5*-5|5) = 1- f and Var(E(5* -5|5)) = l/(4n). 
By Theorem 15. 3| 



S- 



< X 



a 



<I>(x) 



< 



C 



n 



for some constant C. 



6 Normal approximations for descent polynomials 

A number of normal approximation theorems for descents and inversions of permutations of sets 
and multisets can be proved using the size biased version of Stein's method In this section. 
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we obtain a normal approximation for the coefficients of the descent polynomial p.lj) in a quite 
general setting. We use the technique developed in [B]. 

Let vr be a uniformly distributed permutation in Il{Di; D2), where D2 is obtained by rearranging 
the cards of Di in some order. Let the decks have ric cards with label c for 1 < c < h and let the 
total number of cards be n. The random variables Xi,X2, ■ ■ ■ , Xn-i are defined as follows: Xi = 1 
if 7r(i) > Tr{i + 1) but Xi = otherwise. Let W = des(7r) = Xi + X2 ^ h Xn-i- Then 

PiW = d) = ^^ ^, (6.1) 

ni\n2\ ...nh\ 

where Cd is the coefficient of the x'^ term of the descent polynomial (|3.1|) . If an approximation for 
the distribution of W is available, (|6.1() can be used to approximate q. 

We will construct W* so that its distribution is VF-size biased. Other constructions of this type 
can be found in 0. Let / be a random variable independent of vr with P(/ = i) = ^Xi/'EW for 
1 < i < n — 1. We assume EVF > so that / is well defined. If c and e are two distinct card labels 
it is useful to define the following set: 

S{c,e) = {{x,y)\x > y,D2{x) = c,L>2(y) = e}, (6.2) 

where D{i), as noted earlier, stands for the label of the ith card in the deck D. Note EXj = 
\S{c,e)\/{ncne) if -Di(i) = c, Di{i + 1) = e and c 7^ e, but EXi = 1/2 if Di{i) = Di{i + 1). 

Given vr and /, we define a permutation vr*, with tt* G n(Di;I?2)- If ''^{I) > '^{^ + 1); then 
vr* = TT. If tt{I) < tt{I + 1) and Di{I) = Di{I + 1), then 7r*(/) = vr(/ + 1), tt*{I + 1) = tt{I), and 
vr*(i) = 7r(i) for i / /,/ + 1. The remaining case is vr(/) < vr(/ + 1), Di{I) = c, Di{I + 1) = e, 
and c 7^ e. Let J be a random variable independent of vr and /, and uniformly distributed over the 
set ^(c, e) of H6.2() . Let J = {x,y). Consider the list 7r(l), vr(2), . . . ,vr(n — 1). Exchange vr(/) and 
x and exchange 7r(/ + 1) and y to get a new list. The permutation vr* from Di to D2 is defined by 
setting vr*(l), vr*(2), . . . , vr*(n — 1) equal to this new list. 

There is another way to describe vr* in the last case above, which is vr(/) < vr(/ + 1), Di{I) = c, 
Di(I + 1) = e, and c 7^ e. Define 

5^(c,e) = {(A:,/)|(vr(A;),^(0) € 5(c,e)}. (6.3) 

We can pick (A;,/) uniformly from this set, exchange vr(/) with ^{k), and exchange vr(/ + 1) with 
vr(/) to get vr*. The set over which {k,l) is distributed depends upon vr and /, but the distribution 
is always uniform. 

Lemma 6.1. P(vr* G A\l = i) = P(7r G A|vr(i) > vr(i + 1)) for any possible set A. 

Proof. If / = i and vr(i) > 7r(i + 1), then vr* = vr. Therefore it suffices to show that 

P(vr* G A\l = i,TT{i) < vr(i + 1)) = P(vr G ^|vr(i) > vr(i + 1)). 

The first case is when Di[i) = Di{i + 1). In this case, we have 

P(vr* G^|/ = i,vr(i) < vr(i + 1)) 
=P(vr(l),...,vr(i-l),vr(i + l),vr(i),vr(i + 2),...,vr(n-l) G^|/ = i,vr(i) < vr(i + 1)) 
=P(vr(l), . . . , vr(i - 1), vr(i + 1), vr(i), vr(i + 2), . . . , vr(n - 1) G ^|vr(i) < vr(i + 1)) 
=P(vr G ^|vr(i) > vr(i + 1)) 
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The last equality above holds because if we pick a uniformly distributed permutation in n(L>i; D2) 
and exchange 7r(i) and 7r(i + 1), the new permutation is also a uniformly distributed permutation 
in n(L»i; L>2) provided Di{i) = Di{i + 1). 

The second case is when Di(i) = c, Di{i + 1) = e, and c / e. Below each of the summations is 
taken over {x,y) £ S{c,d). 

P(7r* G A\I = i,7r{i) < 7r(i + 1)) 
= ^P(7r* e A|/ = i,7r(i) <4i + l),J= (x,y))P(J= (x,y)) 

= ^ P(7r* G A|/ = i, Tr{i) < 7r(i + 1), J = (x, y))P(7r(i) = x, 7r(i + 1) = y |7r(i) > 7r(i + 1)) 
= ^ P(7r e A|7r(i) = X, 7r(i + 1) = y)P(7r(z) = x, ir{i + 1) = y|vr(i) > 7r(i + 1)) 
=P(7r G A|7r(i) > 7r(i + 1)). 

The second equality above holds because J is uniformly distributed over the set S{c, d) of (|6.2|) 
and because tt is a uniformly distributed permutation in Il(Di; D2). The third equality above 
holds because vr is a uniformly distributed permutation in n(Di; D2) and because of the way vr* is 
generated using vr, /, and J. □ 

For i = 1,2,... ,n - 1, define X* = 1 if iT*{i) > TT*{i + 1), but X* = otherwise. Let 
W* = des(7r*) = X^ + X^ + ■ ■ ■ + X*_^. 

Lemma 6.2. W* has the W-size biased distribution. 

Proof. Follows from Lemmas 15.21 and 16.11 □ 

If Theorem 15. ^-il is to be applied to approximate the distribution of W, it is necessary to find an 
upper bound for Yav{E{W* - W\W)). By 111 p. 477], Var(E(VF* - W\W)) < Var(E(VF* - W\Tr)), 
since is a function of tt. Let 



holds because Var(y) = EY"^ - (EY)^ and Covar(y,Z) = EYZ - EYEZ. Along with dH^l), the 
following Lemma 16.31 is useful for upper bounding Yai{Q). 

The argument to upper bound Yar{Q) simplifies a great deal if it is assumed that cards with 
any label occur the same number of times in Di or D2; in other words, ni = n2 = ■ ■ ■ = = uq 
with no > 1. We make this assumption from here onwards and put it to use immediately in the 
lemma below. With this assumption, the total number of cards is n = Kuq. 

In the lemma below, as will become evident from its proof, the constants 7 and 10 can be 
replaced by other positive integers. We take the constants as 7 and 10 to facilitate later use of the 
lemma. The constants were chosen to simplify the exposition and are not the best possible. 

Lemma 6.3. Let tt be a uniformly distributed permutation in Il{Di]D2). Let f{iT) depend only 
upon the relative order 0/ 7r(zi), 7r(i2), . . . , vr(Zf.). Let g('ir) depend only upon the relative order of 
7r(ji), 7r(j2), • • • >7r(js). Assume \ f\ < 7, \g\ <7,r< 10, and s < 10. There are three cases: 



Q = E{W* -WI-k). 



(6.4) 



We will upper bound Var(Q). 
The identity 




(6.5) 
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1. {n,...,v}n{ji,...,js} 7^0, 



2. {ii, . . . , n {ji, . . .,js} = (j), but Di{ip) = Di{j^) for some 1 < p <r and 1 < ip < s, 

3. {n, . . . , v} n {ii, . . . = 4), and Z)i(ip) / Di{j^) for any 1 < p < r and l<ip<s. 
In these three cases, we have 

1. Covar(/, (jr) < C for some constant C that depends on neither uq nor h, 

2. Covar(/, (jf) < C/uq for some constant C that depends on neither no nor h, 

3. Covar(/,5) =0, 
respectively. 

Proof. For the first case, it is enough to note that / and g are bounded in magnitude. For the third 
case, it is enough to note that / and g are independent of each other. The second case remains to 
be proved. 

It is enough to consider / and g to be indicator functions that are 1 for a particular relative 
ordering and for others. This is because any f oi g that is bounded in magnitude by a constant 
can be written as a linear combination of a constant number of such indicator functions with 
coefficients that are bounded in magnitude by a constant. We show the proof assuming / = 1 if 
7r(i) < Tr(i + 1) and / = otherwise; and g = 1 if 7r(j) < 7r(j + 1) and g = otherwise. For the 
second case to apply, either i + l<jorj + l<i must hold, and at least one of Di{i), Di{i + 1) 
must equal one of -Di(j), Di{j + 1). 

Let Pi = P(7r(i) < 7r(i + 1)) and P2 = P(vr(j) < 7r(j + 1)). We claim that 

F{TT{j) < TT{j + l)\TT{i) = x,Tr{i + 1) = y) =P2 + e, (6.6) 

with |e| < A/no. In (|0|> . I < x,y < n, x ^ y, D2{x) = Di{i), and D2{y) = Di{i + 1). The 
claim is true because of the following argument. If Di{j) = Di{j + 1), the pair (vr(j),7r(j + 1)) 
can take no (no — 1) different values. Otherwise, it can take Uq values. For some of these values, 
7r{j) < 7r{j + 1). Given 7r(i) = x and 7r(i + 1) = y, at least one of x or y is not allowed to appear in a 
possible value for (vr(j), 7r(j + 1)). Thus at most 2no possible values for this pair must be excluded. 
The proof for this / and g can be completed by noting that Covar(/, 17) equals 

P(7r(i) <^(i + l),7r(i) <7r(j + l))-P(7r(i) <7r(i + l))P(7r(j) <^(j + l)), 

and by writing the first of the three probabilities in the line above in terms of the conditional 
probabilities on the left hand side of (|6.6j) . The proof for more general / and g is similar. □ 

We now go back to Q defined by 1)6. 4() and the construction of the size biased random variable 
W* . Let us suppose that vr is a given permutation in Il{Di; D2). To write Q as a sum, we introduce 
the quantities Xirihi + 1) and Xirihi + ^7^,1), where k ^ I. The first of these is defined as the 
change in the number of descents when 7r{i) and 7r(i + 1) are exchanged. The second is defined as 
the change in the number of descents when 7r(i) is exchanged with Tr{k) and 7r(i + 1) is exchanged 
with Tr{l). With a view to applying Lemma 16.31 later on, we note that Xnii,i + 1) depends only 
on the relative order of at most 4 numbers, namely 7r(i — 1), 7r(i), 7r(i + l),'/r(i + 2). Similarly, 
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Xirihi + depends only on the relative order of at most 10 numbers of the form The 

magnitude of both of these quantities is always bounded by 7. 

From here onwards, we denote EVF and Var(VF) by fi and a"^, respectively. 

Let A be the set of values of i for which 7r(i) < Tr{i + 1) and Di(i) = Di{i + 1). Let B be the 
set of values of i for which 7r(«) < Tr{i + 1), Di{i) / Di{i + 1), and S{Di{i),Di{i + 1)) / (f>. We 
must have ACi B = (p. By the construction of W* using vr, / and J, if i G A, we have 

P{I = i) = l- and E{W* -W\tt,I = i) =xAhi + '^)- (6-7) 

Ui e B, let Di{i) = c and Di{i + 1) = e. We have, 

P{I = i) = - (6.8) 

and 

E{W* - W\7r, I = i) = ^E{W* - W\tt, I = i,J={x, y))P( J = {x, y)) 

_ sr^ X7r(i,i + l,fc,0 Qx 
|5(c,e)| ' ^'-'^ 

where the first summation is over all {x,y) in the set S{c,e) of 1)6. 2|) and the second summation is 
overall {k,l) in the set S^{c,e) of Q. Ui ^ Aandi ^ B, either P(I = i) = or E(l^* - W^|7r, / = 
i) = 0. 

By (jOl), (EUl), (EiHl) and (jESI, we have 

n — 1 



Q = ^E{W* - W\Tr,I = = i) 



i=l 

and 

^ v^X7r(i,i + l) ^ Xnii,i + l,k,l) 

i^Q = Z^ 2 + :^ • (6.10) 

i&A iGB and (fc,Z)e5^(Di{j),Di(i+l)) ^ 

We will use (|6.10|) with ()6.5() to upper bound Var(/i(5). 

If (|6.5)) and ()6.1())) are used to write Var(/x(5) as a sum of variance and covariance terms, each 
term on the right hand side of ()6.10|) will contribute a single variance term. The total contribution 
of these variance terms will be bounded by Cn, for some constant C, because each x-k is bounded 
in magnitude by 7. 

Some of the covariance terms will be of the form 

^ ( Xir{ii,k + l,ki,h) Xn{i2,i2 + l,k2,l2)\ .„..x 
Covar K , ^ , (6.11) 

where ii £ B and 12 € B. The value of Xirihih + 1,^17^1) depends only upon the relative order 
of the vr(z), with i equal to ii or zi + 1 or ki or li, or differing from one of those 4 integers by 
at most 1. There can be at most 10 such values for i and we will denote this set of values by 
nghdjii, ii + l,ki,li}. As for the magnitude of the covariance term ()6.11|) . there are three cases. 
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1. Suppose nghdjzi, ii + l,ki, Zi} nnghd{i2, «2 + 1> ^2, ^2} / This corresponds to the first case 
of Lemma inini Therefore the magnitude of the covariance term is bounded by C/hq in 
this case. 

To count the number of covariance terms of this type, note that ii can take at most n — 1 
different values. As we require Di{ki) = Di{ii) and Di{li) = Di{ii + 1), for given ii, there 
are at most possible values of For the covariance term 1)6. 11(1 to fall under this 

type, at least one of 12,^2, h must differ from one of «i, fci, li by less than 2. Therefore, given 

11, ki,li, there are at most a constant number of choices for one of ^2,^2, ^2- Having chosen 
one of ^2, ^2) ^2, there are at most Uq ways to choose the other two. For example, suppose /c2 
has been chosen. This restricts 12 to at most no possibilities since we require Di(i2) = Di{k2). 
Given ^2, there are at most no possibilities for I2 since we require -Di(^2) = Di{i2 + 1). Thus 
the number of covariance terms of this type is bounded by CnriQ for some constant C. 

2. Suppose nghd{ii,ii + nnghd{z2,i2 + 1,^25^2} = but = -01(^2) for some 
il G nghd{zi,ii + and some G nghd{i2,«2 + 1,^2,^2}- This corresponds to the 
second case of Lemma [6. ^-il Therefore the magnitude of the covariance term is bounded 
by C/uq for some constant C in this case. 

The number of such covariance terms is bounded by CnriQ. The argument is the same as that 
in the previous case except for one difference. Given ii,ki,li, in this case, we require one of 

12, k2,l2 to differ by 2 or less from some position i* such that identical cards occur at i* and 
at one of the positions in nghdjzi, ii + 1, ki,li} in the deck Di. Therefore, there are at most 
Cno possibilities for one of 12, k2, h and not just a constant number of possibilities as in the 
previous case. 

3. Suppose nghd{ii,ii + n nghd{i2,^2 + 1,^2,^2} = 4'j and Di{il) / Di{i2) for any 
i* G nghdjii, ii +l,ki,li} and any ^2 G nghd{i2, ^2 + 1, ^2, ^2}- This corresponds to the third 
case of Lemma 16.31 Therefore the covariance term ()6.11|) is in this case. 

Consequently, the total contribution of covariance terms of the form 1)6. 11(1 to Var(/iQ) is 
bounded by Cn for some positive constant C. Apart from (|6.11|) . covariance terms can also be 
of the form 



with ii £ A and 12 G A. The proof that the total contribution of such terms to Var(/iQ) is also 
bounded by Cn is similar to and simpler than the case that has already been dealt with. The 
bound on Var(/uQ) is stated lemma below. 

Lemma 6.4. Var(^Q) < Cn for some positive constant C. 

Theorem 6.5. Let Di be a deck of cards with h different labels with each label occurring no times. 
Let D2 be another deck with the same n = hnQ cards in a different order. Let tt be a uniformly 
distributed permutation in Il{Di;D2). Let the random variable W be the number of descents of n. 




with ii £ A and ^2 G B, or 



Covar(x,r(n,'ii + l),Xn{i2,i2 + 1)) 
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Let fi = EW and ct^ = Var(PF). Assume a > 72/3(6^)^3. Then 



< X - <I>(2;) 



<c(i + 4+4+4). 

\a o"^ J 



for some positive constant C and ^{x) = ex.p{—u^/2)du. 



Proof. By Lemma [6.21 W* has the T^-size biased distribution. By construction of W* , \W* — W\ < 
7. By m p. 477], Var(E(Ty* - W\W)) < Var(Q) with Q defined by The proof can be 

completed using Theorem 15.31 if Lemma 1^31 is used to note that Va,r{Q) < Cn/ji'^. □ 

Consider the atypical case where Di = (1,2)"''' and D2 = l"o,2"''. Then every permutation 
in Il(Di; D2) has exactly no — 1 descents, with each descent corresponding to an occurrence of 2 
immediately before a 1 in the deck Di. In such a case = 0, but typically a"^ will be of the order 
of n. As /i < n always, in such cases, the normal approximation to W given by Theorem IB. 51 will 
have 0(n~^/2) error. The computation of fi and a given Di and D2 will be described presently. 

Calculating the mean and the variance of W. The expectation FiW can be computed by 
summing EXj over 1 < i < n — 1. If Di[i) = a and Di{i + 1) = 6, the expectation EXj is 0.5 if 
a = b. If a 7^ 6, 

R{a,b) 
' N{a)N{b)' 

where N(a) and N(b) are the number of cards with labels a and b in the deck D2, and R{a,b) = 
T.i<i^<la<nX{la,lb) with x{la,k) = 1 if D2{la) = a and L>2(^6) = b and xiLJb) = otherwise. 

The variance Var(VF) can be obtained from K^XiXj) for 1 < i < j < n — 1. The computation of 
these joint expectations involves many cases. First suppose that j > i + 1. Denote Di{i), Di{i + 1), 
Di{j), and Di{j + 1) by a, b, c, and d, respectively. If a = 6, then E(XjXj) = EXj/2. Likewise if 
c = d, E(XjXj) = 'EXi/2. There are seven cases when a ^ b and c ^ d. One of these is when a = d 
and a, b, c are distinct. In that case, 

^ R{a,b)R{c,a) - R{c,a,b) 
^[^t^j) N{a)N{b)N{c){N{a) - 1) ' 

where R{c,a,b) = J2i<i,<ia<k<n-i'x{l'c,la,lb) with xilcLJc) = 1 if -D2(^c) = c, 1)2(^0) = a and 
D2ih) = b but x(^c, ^aj ^c) = otherwise. The other cases are handled similarly. The j = i + 1 case 
is also handled similarly. 



7 Shuffling cards for blackjack and bridge 

The rules for blackjack vary with the gambling house, but the distinction between the four suits 
is always ignored. Most of the time all the face cards are equivalent to cards with the number 10, 
but there are some obscure situations where 10s, jacks, queens, and kings must all be considered 
distinct. We take a blackjack source deck to be any permutation of the multiset {1^, . . . , 13^}. We 
consider two of these — 1^, . . . , 13^ and (1, . . . , 13)^ — which are notable for their symmetry. These 
two will be called Blackjackl and Blackjack2, respectively. Let pi be the transition probability from 
one of these source decks to the ith possible ordering of that deck after a certain number of riffle 
shuffles. Ideally, we would like all of these pi to be equal. 
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The situation for bridge is different. All the 52 cards in the source deck are distinct but there 
are only four players. Each player must be dealt a random set of 13 cards but the order in which 
a player receives his cards is immaterial. 

Suppose the cards are dealt to players AA, £, S, and W in cyclic order, as is the common 
practice. Let the deck which needs to be dealt to the four players be 1, 2, . . . , 52. Let the sets Vj^, 
'Pe-, 'Ps-, and Pyv be a partition of {1, 2, ... , 52} with the cardinality of each set being 13. There 
are 52!/13!^ such partitions. Ideally, we would like the probability that J\f , £, S, and W receive 
cards in Vj\/, Vs, Vs-, and respectively, to be equal for all those partitions. 

Suppose that the probability that AA, <S, 5, and W receive cards in Vj^f, Vs-, Vs-, and "Pyy, 
respectively, is equal to p-p, when the deck 1, 2, . . . , 52 is a-shufSed and then dealt to M , £., S, and 
W is cyclic order. Let the set of permutations of {1,2,..., 52} that result in such a deal be Sp. 
Now consider the deck D with 52 cards such that if i belongs to Vj^f, Vs, Vs-, or Vy^-, the ith card 
of D is M , £, S, or W, respectively. Then the set of permutations Sp equals Ii{D; {Af£SWy^). 
Therefore, pp is equal to the probability that an o-shuffle of D results in {J\f£Syvy^ . Ideally, we 
would like these transition probabilities to be equal for all 52!/13!^ possible decks D. 

In both situations, we have N probabilities Pi, 1 < i < N , which sum to 1, and ideally we would 
like all of them to be 1/A^. Unfortunately, that would require an infinite number of riffle shuffles. 
We have to determine how close the probabilities are to the uniform distribution after a certain 
number of riffle shuffles, and then decide how close is close enough. 

There are many ways to measure the closeness between probability distributions, but the notion 
of closeness must be chosen with care. Some metrics such as the Euclidean distance or the 
norm are inappropriate as the following example shows. Suppose one probability distribution 
always picks the first out of N possibilities and another always picks the second. The Euclidean 
distance between these two distributions is \/2- On the other hand, suppose a distribution picks 
one of the even numbered possibilities with equal likelihood and another distribution picks one of 
the odd possibilities with equal likelihood. The Euclidean distance between the third and fourth 
distributions is 2/y/N + 0{1/N), which incorrectly suggests that these two distributions are much 
closer to each other for large N. 

7.1 A table of total variation distances 

The notion of closeness we use here is the total variation distance. The total variation distance 
from the probability distribution defined by the pi to the uniform distribution is given by 



where = x ii x >0 and x^ = if x < 0. This total variation distance always lies between and 
1. It also has a probabilistic meaning — if Pi{A) is the probability of a certain subset A of the N 
possibilities under the distribution given by the pi and if P2 (A) is the probability of that set under 
the uniform distribution, then the total variation distance equals the maximum of |Pi(A) — P2(^)| 
over all possible A. 

Table El gives the total variation distances from the uniform distribution for nine scenarios 
termed BayerDiaconis, Blackjackl, Blackjack2, Bridgel, Bridge2, RedBlackl, RedBlack2, Alice- 
Bobl, and AliceBoh2- That table allows the number of riffle shuffles (i.e., 2-shuffles) to vary from 
1 to 10. In five of the scenarios, the source deck is fixed and the target deck is allowed to vary. 




(7.1) 
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I 


2 


3 


4 


5 




7 


g 


9 


10 


BayerDiaconis 


1 


1 


1 


1 


.924 


.614 


.334 


.167 


.085 


.043 


Blackjackl 


1 


1 


1 


.481 


.215 /.23 


.105 /.II 


.052 /.05 


.026 /.03 


.013 /.Ol 


.007 /.OO 


Blackjack2 










.60 


.32 


.16 


.08 


.04 


.02 


Bridgel 


1 


1 


1 


.990 


.748/. 75 


.423/.42 


.218/.21 


.llO/.ll 


.055/.05 


.027/.03 


Bridge2 






.45? 


.16 


.08 


.04 


.02 


.01 


.00 


.00 


RedBlackl 


.580 


.360 


.208 


.105 


.052/. 05 


.026/.03 


.013/. 01 


.007/01 


.003/.00 


.002/. 00 


RedBlack2 










.10 


.03 


.01 


.01 


.00 


.00 


AliceBobl 


1 


1 


.999 


.725 


.308 /.31 


.130/. 13 


.059/.06 


.028/.03 


.013/.01 


.007/.01 


AliceBob2 










.02 


.01 


.00 


.00 


.00 


.00 



Table 2: Table of total variation distances after 1 to 10 riffle shuffles for nine scenarios from 
BayerDiaconis to AliccBob2. The numbers in boldface have an error less that .01 with a probability 
greater than 99.9996%. The numbers that are not in boldface were determined using a less accurate 
and heuristic method. Entries of the table with a number in boldface and in ordinary type separated 
by a slash can be used to form an idea of the reliability of the less accurate method. 



21 



These are BayerDiaconis, where the source deck is fixed as Di = 1,2,..., 52; Blackjackl with 
Di = 1^, 2*,..., 13^; Blackjack2 with Z^a = (1, 2, . . . , 13)"^; RedBlackl with Di = R^^B^^; and Red- 
Black2 with Di = {RBf^ . In the other four scenarios the target deck is fixed. These are Bridgel, 
where the target deck is fixed as D2 = AAi3£;i3^i3yyi3. Bridge2 with D2 = (MSSW)^^; AliceBobl 
with D2 = A^'^B^^; and AliceBob2 with D2 = (AB)^^. 

How were the total variation distances shown in Table |2l determined? For BayerDiaconis there 
is a simple and elegant formula for the total variation distance after a certain number of riffle 
shuffles that was derived by Bayer and Diaconis p?. Some of the other numbers require more 
extensive computation. Two numbers for Bridge2, with the number of riffle shuffles being 3 or 4, 
were determined using more than 10000 hours of CPU time on the Teragrid computer network. 
From the point of view of card players, those two numbers are perhaps the most interesting results 
of this paper. First, we explain how the numbers in the rows of Table ^ other than the first 
BayerDiaconis row, were computed, and why they can be trusted. 

7.2 Monte Carlo estimation of total variation distance 

The determination of the numbers in Table [2 is impeded by two problems. The first of these is 
that the summation in (|7.H) has so many terms that it is impractical to use (|7.1|) to find the total 
variation distances shown in Table El 

This first problem is easy to overcome. Denote the sum in (|7.1j) by S. Let Xi be a random 
variable that is equal to [l/N — Pi)~^, the ith term in (|7.H) . with probability 1/N for 1 < i < N. 
Such a random variable can be easily generated if we can determine the transition probabilities 
Pi efficiently. We have EXi = S/N and Var(Xi) < l/N"^. Let Xi, . . . , Xk be independent and 
identically distributed, and let = (Xi + • • • + Xh)N/k. Then 

En = S. (7.2) 

The theorem below tells us how good an estimate of S can be obtained from a single instance of 
the random variable Y^. 

Theorem 7.1. For a > and k > 2, 



Proof. Consider the following calculation. 



E(|n-5|^) = E 



X^-j^]+--- + {X, 



i=l 

< {l/k^ + 3/k^) < 4/A;2 



S_ 

N 




E 



l<i<j<fe 



X 



The second equality above follows from E(Xj — S/N) = and from the independence of Xi and Xj 
for i ^ j- To deduce the first inequality in the last line above, note that Xi has the range [0, 1/A^] 
with EXj = S/N. The proof can be completed using Markov's inequality |^. □ 
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In some special cases that include Blackjackl, Bridgel, RedBlackl, and AliceBobl, we have de- 
rived efficient polynomial time algorithms for finding the descent polynomials (2|- Those algorithms 
and ()3.2p can be used to determine the transition probabilities efficiently. The numbers given for 
these four cases in Table |2l in boldface were obtained using k = 10 random permutations of the 
source deck or the target deck, exact computation of the transition probabilities, and Monte Carlo 
summation (|7.2|) . Theorem 17. II with k = 10^ and a = VTO implies that the boldface numbers have 
errors less than .001 with probability greater than 96%. Theorem 17. II with k = 10^ and a = 10\/T0 
implies that the boldface numbers have errors less than .01 with probability greater than 99.9996%. 

The more significant problem is that there may be no efficient way to determine the transition 
probabilities pi. Indeed there is very probably no efficient method for determining these transition 
probabilities in general, as proved in Section 4. 

7.3 Approximation of the descent polynomial 

In the other four cases — Blackjack2, Bridge2, RedBlack2, and AliceBoh2 — we turn to the Monte 
Carlo method once again to approximate the descent polynomials. Suppose we are given decks Di 
and D2 and it is required to approximate the descent polynomial of permutations in Ii{Di; D2). If 
the decks Di and D2 have ric cards with label c for 1 < c < /i, then the total number of permutations 
in n(Z)i; D2) is ni! . . . n/j!. We generate I random permutations tti, . . . , tt/ from this collection and 
form the polynomial 

I 

p = Y^ ^des(..) ^ + + . . . + 7n-ix"-^ (7.3) 

i=l 

The coefficient jd counts the number of random permutations with d descents. The approximation 
to the descent polynomial is taken to be in other words, the polynomial P given by ()7.3() 

is normalized to get an approximation to the descent polynomial. 

Once the polynomial P defined by 1)7. 3|1 is formed, it carries within itself an estimate of the 
accuracy of the coefficients of the descent polynomial obtained by normalizing P, as we will explain. 

Suppose that X is a Bernoulli random variable with P{X = 1) = p and that p is unknown. 
Suppose that Xi, . . . Xi are independent with the same distribution as that of X, and that in one 
experiment m out of these / random variables equal 1. We can estimate p ~ m/l, but how accurate 
is this estimate? Let Y = {Xi + . . . + Xi)/l. Then EY = p and Var (y) = p{l -p)/l. Therefore the 
fluctuations of Y about its mean are of the order \/p{'\. — p)/l- If we use a single instance of Y to 
estimate its mean, which is p, then we expect an absolute error of about y^p(l — p)/l and a relative 
error of about y^(l — p) /Ip. If we substitute p = m/l, we find that the relative error will be about 
[l — m)/lm. lip is very small, then m « I and we may expect a relative error of about l/-^/m. 

If we define -'^(vr) = 1 if des(7r) = d and X{Tr) = otherwise, where vr is a uniformly distributed 
permutation in I[{Di; D2), then 7^ defined by (|7.3() is Xi + • • • + Xi, where Xi = X{Tri) are 
independent with the same distribution as that of X{tt). Then by the argument of the preceding 
paragraph, the relative error in the estimate of the x"^ coefficient of the descent polynomial will be 
about l/^Td. 

To illustrate this estimate in practice, we take the target deck to be D2 = {N'£SW)^'^ , which is 
fixed for Bridge2, and the source deck Di to be 

NSEENNWEWSSWESWNNNEESSSSSESWWNNSENWSEWSWWWEENEWNNNWE. 
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We computed P, which is defined by ()7.3() . with I = 10^, and got the coefficients of x^^, x^^, and 
to be 17, 397, and 4560, respectively. If the descent polynomial of permutations from Di to 
D2 is approximated as 13!'*P/10^, we expect the relative errors in the coefficients of x^^, x^^, and 
x^^ to be about 25%, 5%, and 1.6%, respectively. When we computed P with / = 10^^, we got the 
coefficients of x^^, x^^, and x^^ to be 2334, 38418, and 468359, respectively. These numbers are 100 
times the counts for / = 10^, if allowance is made for the relative errors when the counts are scaled 
and interpreted as coefficients of the descent polynomial. We conclude that descent polynomials 
can be approximated using (|7.3j) and that the accuracy of the approximation can be gauged by 
looking at the coefficients of P. 

By (|2.1|) . the probability that an a-shuffle leads to a permutation with d descents is equal to 



where (3) is the Eulerian number that counts the number of permutations of 1,2, ... ,n with d 
descents. The Eulerian numbers can be calculated using simple recurrences jl5j . 

The probability that an a-shuffie with a = 32 has 16 or more descents is more than 0.95. 
Coefficients of terms from x^^ to x^^ of descent polynomials that correspond to the scenarios from 
Blackjackl to AliceBob2 in TableEJcan be approximated well using 1)7. 3|) with / = 10''. The numbers 
reported in Table |21 for these scenarios, with the number of riffle shuffles varying from 5 to 10, that 
are not in boldface were computed by approximating descent polynomials in this manner. The 
transition probabilities were computed using these approximate descent polynomials and (|3.2j) . 
and the total variation distances were computed using ()7.2|) with k = 1000. The total variation 
distances computed this way are reported with two digits after the decimal point. They compare 
well with more accurate estimates of the total variation distances, which are found in boldface in 
Table 121 for Blackjackl, Bridgel, RedBlackl, and AliceBobl. 

We still need to explain the computation of the total variation distances for Bridge2 when the 
number of riffle shuffles is 3 or 4. Four riffle shuffles are equivalent to an a-shuffle with a = 16, 
and it is necessary to accurately compute the coefficients of terms from x^^ to x^^ of the descent 
polynomials to find the total variation distance for Bridge2 after four riffie shuffles. Obtaining 
an accurate estimate for the coefflcient of x^^ using (|7.3|) would require an / that is beyond the 
reach of today's computers. We used (|7.3|) with / = lO^'^, and with this I the coefficient of x^^ is 
approximated with a relative error that is less than 5% with a probability greater than 95%. We 
took the logs of the coefficients of the terms from x^^ to x^^ and computed degree 4 polynomials 
that were least squares fits to these numbers. These polynomials were nearly quadratic in accord 
with Theorem 16.51 We got the coefficients of the x^'^, x^^, and x^^ terms by extrapolation. We feel 
sure that the extrapolated coefficients had relative errors smaller than 10%. 

For Bridge2 and three riffie shuffies, we got the coefficients of the x^ terms in the descent 
polynomials using the coefficients of terms from x^^ to x^^ and polynomial fits of degree 6. We are 
less sure that the estimated total variation distance for this case is accurate. 

8 Three open problems 

1. The first of the three open problems that we mention in this section is about a more general 
model of riffle shuffles. In the model of riffle shuffles we have used so far a random riffle shuffle of 
a deck of n cards is obtained by first generating a random sequence of n numbers. The numbers in 




1 /a + n — d— 1 



n 




24 



the sequence are independent of each other and each number is either 1 or 2 with probability 1/2. 
The model can be changed by requiring the first number in the sequence to be either 1 or 2 with 
probability 1/2. Every later number in the sequence equals the preceding number with probability 

1 — p and it is of the opposite kind with probability p. This model is described by Aldous [l] and 
Diaconis [S]. 

When p = 1/2, we get back the GSR-model which was described in Section 2. If p > 1/2, 
then the riffle shuffles are neat — if a card is dropped from one hand the next card is likelier to be 
dropped from the other hand. If p < 1/2, the shuffling is clumsy. The problem is to determine how 
the mixing times depend upon p. 

Any sequence of Is and 2s where every 1 occurs before every 2 corresponds to the identity 
permutation. All other sequences correspond to distinct permutations. The probability of one of 
these permutations under this model will be 0.5 * p^{l — p^^-^-^ if there are k places where a 

2 follows a 1 or a 1 follows a 2 in the corresponding sequence. The probability of the identity 
permutation is also easy to determine. Suppose we want to know the probability that a given 
permutation tt can be obtained as the composition of m riffle shuffles none of which is the identity. 
This number will be a polynomial in p of the form 

m(n— 1) 
fe=0 

with the coefflcients Ck depending upon vr and m. A good place to start might be by asking if the 
coefflcients can be determined in polynomial time. 

The value of this polynomial when p = 1/2 can be deduced from the work of Bayer and 
Diaconis 0. When p = 1, each riffle shuffle is either a perfect in-shuffle or a perfect out-shuffle 
with probability 1/2. Much information about this case can be found in the work of Diaconis, 
Graham, and Kantor |lUj . 

2. The second problem is purely combinatorial and is related to the riffle shuffles of the deck (1, 2)" 
[2]. Consider all sequences of length 2n with n Is and n 2s. Define any two sequences Di = a(3^ 
and D2 = a(3*"f to be i?-related if (3 has the same number of Is as 2s and if (5* is obtained from (3 by 
changing Is to 2s and 2s to Is. This is an equivalence relation. We conjecture that the number of 
equivalence classes is (n-|-3)2"~^. If the equivalence relation is modified by requiring a and 7 to be 
sequences of the same length, the number of equivalence classes is the Catalan number (^n+i) 

m 

3. The third problem too is purely combinatorial. Consider all permutations tt of the numbers 
1, . . . , nh such that ■n{i) = i mod h for 1 < i < nh. The problem is to derive a recurrence for the 
number of these permutations that have exactly d descents. If /i = 1, the familiar recurrences for 
the Eulerian numbers (see |15j ) solve this problem. A solution of this problem will make it possible 
to compute the transition probability from the deck (1,2,..., h)"' to itself under an a-shuffle. 
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